Abstract

Wild species of Oryza are extremely valuable sources of genetic material that can be used to broaden the genetic 
background of cultivated rice, and to increase its resistance to abiotic and biotic stresses. Until recently, there was no 
sequence information for the BBCC Oryza genome; therefore, no special markers had been developed for this genome type. 
The lack of suitable markers made it difficult to search for valuable genes in the BBCC genome. The aim of this study was to 
develop microsatellite markers for the BBCC genome. We obtained 13,991 SSR-containing sequences and designed 14,508 
primer pairs. The most abundant repeat class was hexanucleotide (31.39%), followed by trinucleotide (27.67%) and dinucleotide 
(19.04%). In total, 600 markers were selected for validation in 23 accessions of Oryza species with the BBCC genome. A set of 495 
markers produced clear amplified fragments of the expected sizes. The average number of alleles per locus (Na) was 2.5, 
ranging from 1 to 9. The genetic diversity per locus (He) ranged from 0 to 0.844 with a mean of 0.333. The mean 
polymorphism information content (PIC) was 0.290, and ranged from 0 to 0.825. Of the 495 markers, 12 were only found in 
the BB genome, 173 were unique to the CC genome, and 198 were also present in the AA genome. These microsatellite 
markers could be used to evaluate the phylogenetic relationships among different Oryza genomes and to construct a 
genetic linkage map for locating and identifying valuable genes in the BBCC genome, and would also be useful for marker-assisted 
breeding programs that include accessions with the AA genome, especially Oryza sativa. 
Introduction

The Oryza genus comprises more than 22 species with 10 
recognized genomic types, six of which are diploid genome sets 
(2n = 24; AA, BB, CC, EE, FF, and GG) and four of which are 
tetraploid (2n = 4x = 48; BBCC, CCDD, HHJJ, and HHKK) [1]. 
According to their genome constitution, species in this genus can 
be classified into four main complexes [2]: the Oryza ridleyi complex 
(including the HHJJ genome); the Oryza granulata complex (including 
the GG genome); the Oryza officinalis complex (including the BB, CC, 
BBCC, CCDD, and EE genomes); and the Oryza sativa complex 
(including the AA genome). There are two cultivated Oryza species, 
namely Oryza sativa and Oryza glaberrima. Asian cultivated rice 
(Oryza sativa) is one of the most important food crops in the world, 
and serves as a primary food source for more than half of the 
world's population [3]. In the field, cultivated rice plants are 
continuously damaged by various biotic and abiotic factors. 
Planting modern varieties that carry resistance and/or tolerance 
genes is one of the best strategies to control pests in rice 
production. Some populations of wild species of Oryza have been 
identified as extremely valuable resources that can be used to 
broaden the genetic background of cultivated rice and to increase its 
resistance to adverse factors. 

The BBCC Oryza genome (2n = 4x = 48) is characteristic of 
allotetraploid wild species with two homologous genomes, B and 
C. Three species have this genome type: Oryza malampuzhaensis, 
which is found in India; Oryza minuta, which is endemic to 
the Philippines and Papua New Guinea; and Oryza punctata (tetraploid, 
2n = 48), which is widely distributed in Africa. The BBCC 
genome is related to the BB and CC genomes [1]. Only Oryza 
punctata (2n = 24) has the BB genome [4,5], while Oryza officinalis, 
Oryza rhizomatis, and Oryza eichingeri have the CC genome. These 
species are regarded as donors of genes that promote resistance to 
rice blast, bacterial leaf blight, brown planthopper, and 
white-backed planthopper [6,7]. 

However, the transfer of valuable genes from these wild species 
to Oryza sativa via crossing has proved to be extremely 
difficult because of low seed set, hybrid sterility, and the lack of 
chromosome recombination [8]. There is no doubt that appropriate 
gene identification technologies will promote the use of 
genetic material from these wild species. The traditional method to 
identify the genomes of Oryza was to observe chromosome pairing 
Table 1. Statistics of the sequence assembly.

Category    Length (bp)     Number
sum         480,470,380     225,833
ave         2,128
largest     41,615
N50         2,329           65,627
N90         1,203           182,019

doi:10.1371/journal.pone.0091826.t001



behavior at meiotic metaphase-I in interspecific hybrids [9,10]. 
However, this process was affected by genetic and environmental 
factors [11,12]. Subsequently, genomic in situ hybridization 
(GISH) was used to identify genomes [13], followed by multicolor 
genomic in situ hybridization (McGISH), an improved method that 
used two different genomic probes simultaneously [14]. Both 
GISH and McGISH were complex methods with demanding technical 
requirements. More recently, DNA molecular techniques, especially 
simple sequence repeat (SSR) markers, have proved to 
be simple and highly effective methods for genetic analysis. A large 
number of SSR markers have been developed for Oryza sativa 
[15,16]. While some of the SSRs developed for Oryza sativa could 
be amplified from other AA genomes in the Oryza genus, they were 
not suitable for cross-amplification from Oryza species with 
different genome types [17], as was also reported for cross-amplification in 
Miscanthus sinensis (Poaceae) and its relatives [18] and Narcissus 
papyraceus (Amaryllidaceae) and its relatives [19]. Since there had 
been no sequence information available for the BBCC genome, 
no specific markers had been developed for it. This made it 
difficult to explore the BBCC genome to find valuable genes, and 
to study the phylogenetic relationships among diverse members of 
the Oryza genus. 

Hence, the goal of this study was to develop the first set of 
microsatellite markers for the BBCC Oryza genome using 
next-generation sequencing (NGS) technology. These microsatellite 
markers could be used to evaluate the phylogenetic relationships 
among different Oryza genomes and to construct a genetic linkage 
map for locating and identifying valuable genes in the BBCC 
genome, and would also be useful for marker-assisted breeding programs 
that include accessions with the AA genome, especially Oryza sativa. 

Materials and Methods 

Plant materials and DNA extraction 

We chose 48 accessions from seven Oryza species (Table S1) 
for this study, representing different ploidy levels, genomic 
constitutions, and genome origins. Thirty-eight accessions were obtained 
from the Germplasm Resource Center of the International Rice 
Research Institute (Los Baños, Philippines), including 23 accessions 
with the BBCC genome, 1 with the BB genome, and 14 with 
the CC genome. The other 10 accessions, all Oryza sativa, were 
obtained from the National Mid-term Genebank for Rice 
(Hangzhou, China). 

Total genomic DNA was extracted from fresh leaves using the 
DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA). 

Microsatellite loci search and SSR primer development 

Genome libraries were constructed from the accession W303 
(Oryza minuta) using the shotgun method, and then sequenced on 
the Illumina HiSeq 2000 sequencer (Illumina Inc., San Diego, 
CA, USA). The genome of W303 (European Bioinformatics 
Institute; Accession number: PRJEB5091) was assembled using 
Phusion2 [20] and Phrap [21]. The N50 length of the entire 
assembly was calculated from the initial contigs, with small contigs 
(< 1,000 bp) excluded. 
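The N50 and N90 values reported in Table 1 can be illustrated with a minimal sketch. It assumes the usual definition (the length of the contig at which the running total of length-sorted contigs first reaches the chosen fraction of the assembly) and applies the 1,000 bp cut-off described above. The function name `nx_length` and the example lengths are hypothetical and are not taken from the assembly pipeline itself.

```python
def nx_length(lengths, fraction=0.5, min_length=1000):
    """Return (Nx length, number of contigs needed to reach it).

    Contigs shorter than min_length are dropped first, mirroring the
    exclusion of contigs < 1,000 bp described in the text.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs pass the length filter")
    target = fraction * sum(kept)
    running = 0
    for count, length in enumerate(kept, start=1):
        running += length
        if running >= target:
            return length, count


# Hypothetical contig lengths (bp); the 500 bp contig is filtered out.
example = [5000, 3000, 2000, 1500, 1000, 500]
n50 = nx_length(example, fraction=0.5)
n90 = nx_length(example, fraction=0.9)
```

Returning the contig count alongside the length matches the layout of Table 1, where the N50 and N90 rows list both a length and a number of sequences.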

The SSRs were identified with the software MISA (Microsatellite 
identification tool, http://pgrc.ipk-gatersleben.de/misa/). The 
primers for each unique SSR were designed using Primer 
3.0 (http://sourceforge.net/projects/primer3/). The primer design 
parameters were as follows: length from 18 bp to 23 bp with 
21 bp as the optimum; annealing temperature between 55°C and 
63°C with 60°C as the optimum; GC content from 40% to 60% 
with 50% as the optimum; and PCR product size between 80 bp 
and 250 bp. 
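The design parameters above could be passed to Primer3 as a Boulder-IO record, as sketched below. This is an illustration only: the tag names follow the Primer3 2.x input format and may differ from those of the release actually used, the annealing temperature range is mapped onto the primer melting temperature (Tm) tags as an assumption, and the helper `primer3_record` with its arguments is hypothetical.

```python
def primer3_record(seq_id, template, ssr_start, ssr_length):
    """Build one Boulder-IO record asking Primer3 for primers flanking an SSR.

    ssr_start and ssr_length describe the SSR position in the template,
    in whatever base index Primer3 is configured to use.
    """
    tags = {
        "SEQUENCE_ID": seq_id,
        "SEQUENCE_TEMPLATE": template,
        "SEQUENCE_TARGET": f"{ssr_start},{ssr_length}",
        # Primer length: 18-23 bp, optimum 21 bp
        "PRIMER_MIN_SIZE": 18,
        "PRIMER_OPT_SIZE": 21,
        "PRIMER_MAX_SIZE": 23,
        # Temperature window: 55-63 C, optimum 60 C (mapped to Tm; assumption)
        "PRIMER_MIN_TM": 55.0,
        "PRIMER_OPT_TM": 60.0,
        "PRIMER_MAX_TM": 63.0,
        # GC content: 40-60%, optimum 50%
        "PRIMER_MIN_GC": 40.0,
        "PRIMER_OPT_GC_PERCENT": 50.0,
        "PRIMER_MAX_GC": 60.0,
        # Product size: 80-250 bp
        "PRIMER_PRODUCT_SIZE_RANGE": "80-250",
    }
    lines = [f"{key}={value}" for key, value in tags.items()]
    lines.append("=")  # a lone '=' terminates a Boulder-IO record
    return "\n".join(lines) + "\n"


# Hypothetical template with an SSR of 18 bp starting at position 90.
record = primer3_record("contig_0001", "ACGT" * 50, 90, 18)
```

The resulting text could be written to a file and supplied to `primer3_core`, one record per SSR-containing sequence.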

SSR genotyping 

The PCR amplifications were carried out with a 2720 thermal 
cycler (Applied Biosystems, Foster City, CA, USA) in 10 μL 
reaction mixtures. Each reaction contained 1.0 μL 10× buffer, 
1.0 μL 2 mmol/L dNTPs, 1.0 μL 25 mmol/L MgCl2, 0.6 μL 
each of forward and reverse primer (10 μmol/L), 0.1 μL 5 U/μL 
Taq polymerase, and 20 ng template DNA. The PCR cycling 



Table 2. Sequence analysis and occurrence of microsatellites in the genome survey.

Category                                          Number
Total number of sequences examined                225,833
Total size of examined sequences (bp)             480,470,380
Total number of identified SSRs                   16,197
Number of SSR-containing sequences                13,991
Number of sequences containing more than 1 SSR    1,814
Number of SSRs present in compound formation      503

doi:10.1371/journal.pone.0091826.t002
Figure 1. Frequencies of different classes of nucleotide repeats, 

doi:10.1371/journal.pone.0091826.g001 

profile was as follows: 94°C, 2 min; 35 cycles of 94°C, 30 s, 60°C 
with an increase/decrease of 1°C, 30 s, and 72°C, 1 min; and 
72°C, 8 min. The amplification products were analyzed on an 
Applied Biosystems 3130xl DNA analyzer (Applied Biosystems), 
and the data were processed using GeneScan and GeneMapper 
software (Applied Biosystems). 

Statistical analysis 

The average number of alleles per locus (Na), the genetic 
diversity per locus (He), and the polymorphism information content 
(PIC) were calculated with the PowerMarker software [22]. All 48 
accessions were clustered using the Neighbor-Joining (NJ) tree 
implemented in the TreeView program [23] according to Nei's 
unbiased genetic distance [24] with 100 bootstrap replications, 
using Oryza sativa as an out-group. 
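The three per-locus statistics can be illustrated with a minimal sketch. It assumes the textbook forms He = 1 − Σp_i² and PIC = 1 − Σp_i² − ΣΣ 2p_i²p_j² (summed over allele pairs i < j), computed from observed allele frequencies. PowerMarker may apply sample-size or inbreeding corrections that are omitted here, and the function name `locus_stats` and the example allele calls are hypothetical.

```python
from collections import Counter


def locus_stats(alleles):
    """Return (Na, He, PIC) for one locus from a list of observed allele calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]

    # Gene diversity (uncorrected): one minus the sum of squared frequencies.
    he = 1.0 - sum(p * p for p in freqs)

    # PIC subtracts the pairwise term 2 * p_i^2 * p_j^2 for every allele pair.
    pairwise = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return len(freqs), he, he - pairwise


# Hypothetical fragment sizes (bp) scored at two loci.
monomorphic = locus_stats([152] * 8)
biallelic = locus_stats([152, 152, 152, 152, 158, 158, 158, 158])
```

Under these formulas a monomorphic locus yields He = 0 and PIC = 0, and PIC never exceeds He, which is in line with the lower bound of 0 and the mean PIC (0.290) lying below the mean He (0.333) reported in the Results.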

Results 

Data from sequencing and microsatellite loci detected 

As shown in Table 1, the total length of the assembled sequences longer than 
1,000 bp was 480,470,380 bp (n = 225,833) (http://www.ricedata.cn/down/W303_fasta.rar). 
The average length of the assembled 
sequences was 2,128 bp, with a maximum length of 41,615 bp 
and no sequences shorter than 1,000 bp. 

In total, 16,197 SSR loci were identified with discrete repeats 
accounting for 97% and compound repeats (C* type and C type) 
accounting for only 3%. We obtained 13,991 SSR-containing 
sequences, and 1,814 sequences contained more than one SSR. 
There were 503 SSRs present in compound formation (Table 2). 
Finally, 14,508 primer pairs were designed. 

Distribution of identified microsatellite motifs and 
classified repeat types 

We set the following minimum length criteria in MISA to 
extract repeated units (unit size/minimum number of repeats): 
(1/18), (2/9), (3/6), (4/5), (5/4), and (6/3). The SSR motif of 
hexanucleotide repeats (5,090, 31.4%) was the most abundant 
class, followed by trinucleotide (4,529, 28.0%), dinucleotide (3,131, 
19.3%), tetranucleotide (1,603, 9.9%), pentanucleotide (1,182, 
7.3%), and mononucleotide repeats (662, 4.1%) (Figure 1A); the 



(A) 14,508 primer pairs; (B) 600 selected primer pairs. 



SSR motif detected at the highest frequency in each class was 
ATCTTT, CGC, CT, TATG, AATCT, and G, respectively. The 
most abundant SSR repeat type in each class was AAAAAG/ 
CTTTTT (4.0%), AGG/CCT and CCG/CGG (16.3%), AG/CT 
(74.6%), ACAT/ATGT (13.7%), AGAGG/CCTCT (9.7%) and 
C/G (64.7%), respectively. 
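The unit size/minimum repeat thresholds used above can be illustrated with a simplified search. This sketch is not MISA itself: it finds only perfect, uninterrupted repeats with a regular expression, ignores compound SSRs, and skips motifs that are themselves built from a shorter unit. The function name `find_ssrs` and the test sequence are hypothetical.

```python
import re

# Unit size -> minimum number of repeats, as given in the text.
MIN_REPEATS = {1: 18, 2: 9, 3: 6, 4: 5, 5: 4, 6: 3}


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One copy of the unit followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. 'CTCT'.
            redundant = any(
                motif == motif[:k] * (unit // k)
                for k in range(1, unit)
                if unit % k == 0
            )
            if not redundant:
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Hypothetical sequence: a (CT)10 dinucleotide and an (AAAAAG)3 hexanucleotide.
demo = "GATTACA" + "CT" * 10 + "GGATCC" + "AAAAAG" * 3 + "TTGCA"
ssrs = find_ssrs(demo)
```

With these thresholds every reported SSR spans at least 18 bp, whatever the unit size, so the low repeat count allowed for hexanucleotides (three) may contribute to their high share among the detected loci.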

Characterization of microsatellite markers for the BBCC 
genome 

We designed 14,508 primer pairs, and selected a set of 600 SSR 
markers in proportion to the frequency of each repeat class (Figure 1). We tested 
the ability of the 600 primer sets to amplify SSRs from 23 
accessions with the BBCC genome. Of the 600 primer pairs, 50 
did not produce amplicons, probably because of mutations at the 
SSR locus. Fifty-five did not amplify fragments of the expected size, 
probably because of In/Del mutations at the SSR locus. Of the 
remaining 495 microsatellite markers (Table S2, http://www.ricedata.cn/down/SSR_data.xlsx), 
156 were monomorphic, and 
339 were polymorphic. There were 223 single-copy and 272 multi-copy 
markers. The mean Na value was 2.5 with a range from 1 to 
9. The He value varied from 0 to 0.844 with a mean of 0.333. The 
mean PIC was 0.290, and ranged from 0 to 0.825. Among these 
markers, 46 were unique to Oryza minuta, five were unique to Oryza 
punctata, and none were specific to Oryza malampuzhaensis. The 
genetic diversity of Oryza minuta was lower than that of Oryza 
punctata (Table 3; Na = lA vs. 1.4; He = 0.093 vs. 0.125; PIC = 
0.081 vs. 0.102). 

Cross-amplification from other related genomes 

Next, we evaluated the suitability of these 495 markers for use in 
other closely related species. Of the 495 markers, only 12 (2.4%) 
were specific to the BB genome, 173 (34.9%) were specific to the 
CC genome, and 299 (60.4%) were common to the BB, CC, and 
BBCC genomes. Eleven markers (2.2%) were found in neither the BB 
nor the CC genome. Most interestingly, 198 markers (40.0%) were 
also present in the AA genome. 
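The category counts above can be checked against the reported percentages with a short calculation. The sketch below only re-derives shares from the counts given in the text; the dictionary labels and the helper `shares` are hypothetical names introduced for illustration.

```python
def shares(counts):
    """Return each category's share of the total, in percent (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts of the 495 validated markers by cross-amplification pattern.
cross_amplification = {
    "BB only": 12,
    "CC only": 173,
    "BB, CC and BBCC": 299,
    "neither BB nor CC": 11,
}

total_markers = sum(cross_amplification.values())
percentages = shares(cross_amplification)

# Markers also amplified from the AA genome, as a share of all 495.
aa_share = round(100 * 198 / total_markers, 1)
```

The four categories are mutually exclusive and sum to 495, whereas the 198 markers shared with the AA genome overlap with them and are therefore reported separately.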

The phylogenetic tree (Figure 2) grouped the 48 accessions into 
two significant, distinct clusters. Cluster I consisted of the BB, CC, 
and BBCC genome species; and cluster II consisted of the AA 
genome species. Cluster I was further divided into two groups, one 
Figure 2. Neighbor-Joining tree of 48 accessions based on Nei's unbiased genetic distance from 495 SSR markers. Bootstrap values 
(out of 100) are indicated at the branch points. Labelled groups: Oryza officinalis, Oryza eichingeri, Oryza rhizomatis, 
Oryza punctata (2n=48), Oryza punctata (2n=24), Oryza minuta, Oryza malampuzhaensis, and Oryza sativa (out-group). Scale bar: 0.05.

doi:10.1371/journal.pone.0091826.g002




consisting of species with the BBCC and BB genomes, and the 
other consisting of species with the CC genome. Within the BBCC 
genome, Oryza minuta and Oryza punctata formed different 
subgroups. Oryza malampuzhaensis was more closely related to Oryza 
minuta than to Oryza punctata. Among the species with the CC 
genome, Oryza eichingeri was more closely related to Oryza officinalis 
than to Oryza rhizomatis. In cluster II, Oryza sativa indica and Oryza 
sativa japonica were clearly divided into two groups. The groups in 
the NJ tree were consistent with the intrinsic relationships among 
Oryza species [17], and further confirmed the usefulness of the newly 
developed microsatellite markers in genetic analyses. 

Discussion 

We developed the first set of microsatellite markers for the 
BBCC Oryza genome. The SSRs were located in both coding and 
non-coding regions, and therefore, they would be useful for genetic 
and evolutionary analyses, high-throughput mapping, and 
marker-assisted plant improvement strategies. In this study, 82.5% of 
selected markers produced clear amplified fragments of the 
expected sizes. This was similar to the success rate of 60-90% 
amplification reported elsewhere [25]. Among these markers, 12 
were specific to the BB genome and 173 were unique to the CC 
genome. Thus, these unique microsatellite markers could be 
developed as probes to identify different species and various 
genomes. We evaluated the transferability rates of the markers in 
different Oryza species. The transferability rate between Oryza 
minuta and Oryza punctata was 89.7%. This was higher than that for 
Oryza species with the BB, CC, and BBCC genomes (60.4%), and 
that between AA and BBCC genomes (40.0%). These high 
transferability rates suggested that different species or genomes 
within the Oryza genus were closely related. 

Our results showed that hexanucleotide repeats (31.4%) 
were the most abundant repeat type, followed by trinucleotide 
(28.0%) and dinucleotide (19.3%) repeats. These findings differed from 
those of previous studies in which dinucleotide or trinucleotide 
repeats were reported to be the most abundant motifs in genomes 
of cultivated rice [16,26], and pentanucleotide repeats (30.5%) 
were the most abundant type in Gossypium raimondii [17]. The 
nature of the microsatellites obtained was related not only to the 
thresholds used to define the microsatellites, but also to genome 
organization, since heterogeneity could lead to differences in 
microsatellite size [27]. The most common hexanucleotide motif 
was AAAAAG/CTTTTT (4.0%), which made up a much lower 
proportion than that of the most common motif in faba bean, 
ACACGC/CGTGTG (49.5%) [28]. The main trinucleotide 
repeats were AGG/CCT and CCG/CGG, representing 16.3% 
of all of the trinucleotide repeats analyzed. The most common 
trinucleotide repeats were AGG/CTT in Amorphophallus [25], and 
CGG/GCC in cultivated rice [16,26]. These results provided 
further evidence that the CCG/CGG motif was very common in 
monocots [29]. This reflected the strong conservation of synteny 
among genomes of diverse monocots, and could result from a high 
GC content and codon bias [30,31]. 

In previous studies, mitochondrial restriction fragment length 
polymorphisms (RFLPs) [32] and inter simple sequence repeat 
(ISSR) [33] markers had been used to study genetic relationships 
among members of the Oryza genus. However, these analyses 
could only distinguish the AA genome from other types, and could 
not separate other related genomes, such as the BB, CC, and 
BBCC genomes. In contrast, the SSR markers developed from the 
BBCC genome were able to differentiate the AA, BB, CC, and 
BBCC genomes, distinguished the BB and CC genomes 
from the BBCC genome, and even identified various species within the 
AA, CC, and BBCC genomes. Thus, the relationships predicted 
from analyses using these markers were consistent with the 
established evolutionary relationships among members of the 
Oryza genus [17]. Single nucleotide polymorphism (SNP) markers 
have since emerged and gained 
increasing popularity. However, as simple bi-allelic co-dominant 
markers, SNPs provide less genetic information per locus and can be considered a 
step backwards when compared to the highly informative 
multi-allelic microsatellites [34]. 

The NJ tree further revealed that the BB genome species were 
more closely related to species with the BBCC genome than to 
those with the CC genome, supporting the view that the BB genome 
was the maternal parent of the BBCC genome [35,36] and that CC 
species evolved later [37]. Oryza malampuzhaensis and Oryza 
officinalis, both of which had the BBCC genome, shared similar 
morphologies; in fact, Oryza malampuzhaensis was considered to be a 
subspecies of Oryza officinalis [38]. There were clear differences in 
the panicle and spikelet between these two species [14]. Our 
results showed that Oryza malampuzhaensis was more closely related 
to Oryza minuta than to Oryza officinalis, consistent with the fact that 